Overview

Dataset statistics

Number of variables: 12
Number of observations: 1460
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 148.3 KiB
Average record size in memory: 104.0 B

Variable types

Numeric: 10
Categorical: 2

Warnings

GarageArea has 81 (5.5%) zeros
TotalBsmtSF has 37 (2.5%) zeros
2ndFlrSF has 829 (56.8%) zeros
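
The zero warnings above are simple counts expressed as a share of the 1460 observations. A minimal sketch of that arithmetic follows; the helper name `zero_share` and the stand-in column are hypothetical and only mimic the shape of GarageArea (81 zeros out of 1460 rows), they are not the real data.

```python
def zero_share(values):
    """Return (count, percent) of exact zeros in a sequence of numbers."""
    values = list(values)
    zeros = sum(1 for v in values if v == 0)
    percent = 100.0 * zeros / len(values) if values else 0.0
    return zeros, percent


# Hypothetical stand-in column: 81 zeros, the remaining rows non-zero.
garage_area = [0] * 81 + [480] * (1460 - 81)
count, percent = zero_share(garage_area)
print(f"GarageArea has {count} ({percent:.1f}%) zeros")
```

The same helper applied to 37 or 829 zeros out of 1460 rows gives the percentages reported for TotalBsmtSF and 2ndFlrSF.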

Reproduction

Analysis started: 2021-02-20 02:32:38.961917
Analysis finished: 2021-02-20 02:32:54.631610
Duration: 15.67 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

YearRemodAdd
Real number (ℝ≥0)

Distinct: 61
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1984.865753
Minimum: 1950
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 1950
5-th percentile: 1950
Q1: 1967
Median: 1994
Q3: 2004
95-th percentile: 2007
Maximum: 2010
Range: 60
Interquartile range (IQR): 37
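
The last two rows of the quantile table follow directly from the others: the range is the maximum minus the minimum, and the IQR is Q3 minus Q1. A small sketch, using the YearRemodAdd values reported above (the function name `spread_from_quantiles` is hypothetical):

```python
def spread_from_quantiles(minimum, q1, q3, maximum):
    """Derive the range and interquartile range from four quantiles."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# Values reported for YearRemodAdd in this report.
spread = spread_from_quantiles(minimum=1950, q1=1967, q3=2004, maximum=2010)
print(spread)  # {'range': 60, 'iqr': 37}
```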

Descriptive statistics

Standard deviation: 20.64540681
Coefficient of variation (CV): 0.01040141217
Kurtosis: -1.272245192
Mean: 1984.865753
Median Absolute Deviation (MAD): 13
Skewness: -0.5035620027
Sum: 2897904
Variance: 426.2328223
Monotonicity: Not monotonic
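
Several of the descriptive statistics are tied together by simple identities: the standard deviation is the square root of the variance, the coefficient of variation is the standard deviation divided by the mean, and the sum is the mean times the number of observations. The sketch below rechecks those identities with the YearRemodAdd figures from this report; the variable names are illustrative only.

```python
import math

# Figures reported for YearRemodAdd in this report.
n_observations = 1460
mean = 1984.865753
variance = 426.2328223

std = math.sqrt(variance)          # standard deviation
cv = std / mean                    # coefficient of variation
total = mean * n_observations      # sum of all values

print(f"std = {std:.4f}")
print(f"cv  = {cv:.6f}")
print(f"sum = {total:.0f}")
```

Small differences in the last digits are expected because the inputs are already rounded.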
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
1950 | 178 | 12.2%
2006 | 97 | 6.6%
2007 | 76 | 5.2%
2005 | 73 | 5.0%
2004 | 62 | 4.2%
2000 | 55 | 3.8%
2003 | 51 | 3.5%
2002 | 48 | 3.3%
2008 | 40 | 2.7%
1996 | 36 | 2.5%
Other values (51) | 744 | 51.0%

Value | Count | Frequency (%)
1950 | 178 | 12.2%
1951 | 4 | 0.3%
1952 | 5 | 0.3%
1953 | 10 | 0.7%
1954 | 14 | 1.0%
1955 | 9 | 0.6%
1956 | 10 | 0.7%
1957 | 9 | 0.6%
1958 | 15 | 1.0%
1959 | 18 | 1.2%

Value | Count | Frequency (%)
2010 | 6 | 0.4%
2009 | 23 | 1.6%
2008 | 40 | 2.7%
2007 | 76 | 5.2%
2006 | 97 | 6.6%
2005 | 73 | 5.0%
2004 | 62 | 4.2%
2003 | 51 | 3.5%
2002 | 48 | 3.3%
2001 | 21 | 1.4%

YearBuilt
Real number (ℝ≥0)

Distinct: 112
Distinct (%): 7.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1971.267808
Minimum: 1872
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 1872
5-th percentile: 1916
Q1: 1954
Median: 1973
Q3: 2000
95-th percentile: 2007
Maximum: 2010
Range: 138
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 30.20290404
Coefficient of variation (CV): 0.01532156307
Kurtosis: -0.4395519416
Mean: 1971.267808
Median Absolute Deviation (MAD): 25
Skewness: -0.6134611725
Sum: 2878051
Variance: 912.2154126
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
2006 | 67 | 4.6%
2005 | 64 | 4.4%
2004 | 54 | 3.7%
2007 | 49 | 3.4%
2003 | 45 | 3.1%
1976 | 33 | 2.3%
1977 | 32 | 2.2%
1920 | 30 | 2.1%
1959 | 26 | 1.8%
1999 | 25 | 1.7%
Other values (102) | 1035 | 70.9%

Value | Count | Frequency (%)
1872 | 1 | 0.1%
1875 | 1 | 0.1%
1880 | 4 | 0.3%
1882 | 1 | 0.1%
1885 | 2 | 0.1%
1890 | 2 | 0.1%
1892 | 2 | 0.1%
1893 | 1 | 0.1%
1898 | 1 | 0.1%
1900 | 10 | 0.7%

Value | Count | Frequency (%)
2010 | 1 | 0.1%
2009 | 18 | 1.2%
2008 | 23 | 1.6%
2007 | 49 | 3.4%
2006 | 67 | 4.6%
2005 | 64 | 4.4%
2004 | 54 | 3.7%
2003 | 45 | 3.1%
2002 | 23 | 1.6%
2001 | 20 | 1.4%

TotRmsAbvGrd
Real number (ℝ≥0)

Distinct: 12
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.517808219
Minimum: 2
Maximum: 14
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4
Q1: 5
Median: 6
Q3: 7
95-th percentile: 10
Maximum: 14
Range: 12
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.625393291
Coefficient of variation (CV): 0.2493772808
Kurtosis: 0.8807615657
Mean: 6.517808219
Median Absolute Deviation (MAD): 1
Skewness: 0.6763408364
Sum: 9516
Variance: 2.641903349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
6 | 402 | 27.5%
7 | 329 | 22.5%
5 | 275 | 18.8%
8 | 187 | 12.8%
4 | 97 | 6.6%
9 | 75 | 5.1%
10 | 47 | 3.2%
11 | 18 | 1.2%
3 | 17 | 1.2%
12 | 11 | 0.8%
Other values (2) | 2 | 0.1%

Value | Count | Frequency (%)
2 | 1 | 0.1%
3 | 17 | 1.2%
4 | 97 | 6.6%
5 | 275 | 18.8%
6 | 402 | 27.5%
7 | 329 | 22.5%
8 | 187 | 12.8%
9 | 75 | 5.1%
10 | 47 | 3.2%
11 | 18 | 1.2%

Value | Count | Frequency (%)
14 | 1 | 0.1%
12 | 11 | 0.8%
11 | 18 | 1.2%
10 | 47 | 3.2%
9 | 75 | 5.1%
8 | 187 | 12.8%
7 | 329 | 22.5%
6 | 402 | 27.5%
5 | 275 | 18.8%
4 | 97 | 6.6%

FullBath
Categorical

Distinct: 4
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 94.1 KiB

Value | Count
2 | 768
1 | 650
3 | 33
0 | 9

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1460
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 1
5th row: 2

Value | Count | Frequency (%)
2 | 768 | 52.6%
1 | 650 | 44.5%
3 | 33 | 2.3%
0 | 9 | 0.6%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
2 | 768 | 52.6%
1 | 650 | 44.5%
3 | 33 | 2.3%
0 | 9 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1460 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
2 | 768 | 52.6%
1 | 650 | 44.5%
3 | 33 | 2.3%
0 | 9 | 0.6%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1460 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 768 | 52.6%
1 | 650 | 44.5%
3 | 33 | 2.3%
0 | 9 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1460 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 768 | 52.6%
1 | 650 | 44.5%
3 | 33 | 2.3%
0 | 9 | 0.6%

1stFlrSF
Real number (ℝ≥0)

Distinct: 753
Distinct (%): 51.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1162.626712
Minimum: 334
Maximum: 4692
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 334
5-th percentile: 672.95
Q1: 882
Median: 1087
Q3: 1391.25
95-th percentile: 1831.25
Maximum: 4692
Range: 4358
Interquartile range (IQR): 509.25

Descriptive statistics

Standard deviation: 386.587738
Coefficient of variation (CV): 0.3325123481
Kurtosis: 5.745841482
Mean: 1162.626712
Median Absolute Deviation (MAD): 234.5
Skewness: 1.376756622
Sum: 1697435
Variance: 149450.0792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
864 | 25 | 1.7%
1040 | 16 | 1.1%
912 | 14 | 1.0%
848 | 12 | 0.8%
894 | 12 | 0.8%
672 | 11 | 0.8%
816 | 9 | 0.6%
630 | 9 | 0.6%
936 | 7 | 0.5%
960 | 7 | 0.5%
Other values (743) | 1338 | 91.6%

Value | Count | Frequency (%)
334 | 1 | 0.1%
372 | 1 | 0.1%
438 | 1 | 0.1%
480 | 1 | 0.1%
483 | 7 | 0.5%
495 | 1 | 0.1%
520 | 5 | 0.3%
525 | 1 | 0.1%
526 | 1 | 0.1%
536 | 1 | 0.1%

Value | Count | Frequency (%)
4692 | 1 | 0.1%
3228 | 1 | 0.1%
3138 | 1 | 0.1%
2898 | 1 | 0.1%
2633 | 1 | 0.1%
2524 | 1 | 0.1%
2515 | 1 | 0.1%
2444 | 1 | 0.1%
2411 | 1 | 0.1%
2402 | 1 | 0.1%

GarageArea
Real number (ℝ≥0)

ZEROS

Distinct: 441
Distinct (%): 30.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 472.980137
Minimum: 0
Maximum: 1418
Zeros: 81
Zeros (%): 5.5%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 334.5
Median: 480
Q3: 576
95-th percentile: 850.1
Maximum: 1418
Range: 1418
Interquartile range (IQR): 241.5

Descriptive statistics

Standard deviation: 213.8048415
Coefficient of variation (CV): 0.452037675
Kurtosis: 0.9170672023
Mean: 472.980137
Median Absolute Deviation (MAD): 120
Skewness: 0.1799809067
Sum: 690551
Variance: 45712.51023
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 81 | 5.5%
440 | 49 | 3.4%
576 | 47 | 3.2%
240 | 38 | 2.6%
484 | 34 | 2.3%
528 | 33 | 2.3%
288 | 27 | 1.8%
400 | 25 | 1.7%
480 | 24 | 1.6%
264 | 24 | 1.6%
Other values (431) | 1078 | 73.8%

Value | Count | Frequency (%)
0 | 81 | 5.5%
160 | 2 | 0.1%
164 | 1 | 0.1%
180 | 9 | 0.6%
186 | 1 | 0.1%
189 | 1 | 0.1%
192 | 1 | 0.1%
198 | 1 | 0.1%
200 | 4 | 0.3%
205 | 3 | 0.2%

Value | Count | Frequency (%)
1418 | 1 | 0.1%
1390 | 1 | 0.1%
1356 | 1 | 0.1%
1248 | 1 | 0.1%
1220 | 1 | 0.1%
1166 | 1 | 0.1%
1134 | 1 | 0.1%
1069 | 1 | 0.1%
1053 | 1 | 0.1%
1052 | 2 | 0.1%

TotalBsmtSF
Real number (ℝ≥0)

ZEROS

Distinct: 721
Distinct (%): 49.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1057.429452
Minimum: 0
Maximum: 6110
Zeros: 37
Zeros (%): 2.5%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 519.3
Q1: 795.75
Median: 991.5
Q3: 1298.25
95-th percentile: 1753
Maximum: 6110
Range: 6110
Interquartile range (IQR): 502.5

Descriptive statistics

Standard deviation: 438.7053245
Coefficient of variation (CV): 0.4148790481
Kurtosis: 13.25048328
Mean: 1057.429452
Median Absolute Deviation (MAD): 234.5
Skewness: 1.524254549
Sum: 1543847
Variance: 192462.3617
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 37 | 2.5%
864 | 35 | 2.4%
672 | 17 | 1.2%
912 | 15 | 1.0%
1040 | 14 | 1.0%
816 | 13 | 0.9%
728 | 12 | 0.8%
768 | 12 | 0.8%
848 | 11 | 0.8%
780 | 11 | 0.8%
Other values (711) | 1283 | 87.9%

Value | Count | Frequency (%)
0 | 37 | 2.5%
105 | 1 | 0.1%
190 | 1 | 0.1%
264 | 3 | 0.2%
270 | 1 | 0.1%
290 | 1 | 0.1%
319 | 1 | 0.1%
360 | 1 | 0.1%
372 | 1 | 0.1%
384 | 7 | 0.5%

Value | Count | Frequency (%)
6110 | 1 | 0.1%
3206 | 1 | 0.1%
3200 | 1 | 0.1%
3138 | 1 | 0.1%
3094 | 1 | 0.1%
2633 | 1 | 0.1%
2524 | 1 | 0.1%
2444 | 1 | 0.1%
2396 | 1 | 0.1%
2392 | 1 | 0.1%

GarageCars
Categorical

Distinct: 5
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 94.1 KiB

Value | Count
2 | 824
1 | 369
3 | 181
0 | 81
4 | 5

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 1460
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 3
5th row: 3

Value | Count | Frequency (%)
2 | 824 | 56.4%
1 | 369 | 25.3%
3 | 181 | 12.4%
0 | 81 | 5.5%
4 | 5 | 0.3%

Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
2 | 824 | 56.4%
1 | 369 | 25.3%
3 | 181 | 12.4%
0 | 81 | 5.5%
4 | 5 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1460 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
2 | 824 | 56.4%
1 | 369 | 25.3%
3 | 181 | 12.4%
0 | 81 | 5.5%
4 | 5 | 0.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1460 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 824 | 56.4%
1 | 369 | 25.3%
3 | 181 | 12.4%
0 | 81 | 5.5%
4 | 5 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 1460 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 824 | 56.4%
1 | 369 | 25.3%
3 | 181 | 12.4%
0 | 81 | 5.5%
4 | 5 | 0.3%

2ndFlrSF
Real number (ℝ≥0)

ZEROS

Distinct: 417
Distinct (%): 28.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 346.9924658
Minimum: 0
Maximum: 2065
Zeros: 829
Zeros (%): 56.8%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 728
95-th percentile: 1141.05
Maximum: 2065
Range: 2065
Interquartile range (IQR): 728

Descriptive statistics

Standard deviation: 436.5284359
Coefficient of variation (CV): 1.258034335
Kurtosis: -0.5534635576
Mean: 346.9924658
Median Absolute Deviation (MAD): 0
Skewness: 0.8130298163
Sum: 506609
Variance: 190557.0753
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 829 | 56.8%
728 | 10 | 0.7%
504 | 9 | 0.6%
672 | 8 | 0.5%
546 | 8 | 0.5%
720 | 7 | 0.5%
600 | 7 | 0.5%
896 | 6 | 0.4%
780 | 5 | 0.3%
862 | 5 | 0.3%
Other values (407) | 566 | 38.8%

Value | Count | Frequency (%)
0 | 829 | 56.8%
110 | 1 | 0.1%
167 | 1 | 0.1%
192 | 1 | 0.1%
208 | 1 | 0.1%
213 | 1 | 0.1%
220 | 1 | 0.1%
224 | 1 | 0.1%
240 | 2 | 0.1%
252 | 2 | 0.1%

Value | Count | Frequency (%)
2065 | 1 | 0.1%
1872 | 1 | 0.1%
1818 | 1 | 0.1%
1796 | 1 | 0.1%
1611 | 1 | 0.1%
1589 | 1 | 0.1%
1540 | 1 | 0.1%
1538 | 1 | 0.1%
1523 | 1 | 0.1%
1519 | 1 | 0.1%

GrLivArea
Real number (ℝ≥0)

Distinct: 861
Distinct (%): 59.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1515.463699
Minimum: 334
Maximum: 5642
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 334
5-th percentile: 848
Q1: 1129.5
Median: 1464
Q3: 1776.75
95-th percentile: 2466.1
Maximum: 5642
Range: 5308
Interquartile range (IQR): 647.25

Descriptive statistics

Standard deviation: 525.4803834
Coefficient of variation (CV): 0.3467456092
Kurtosis: 4.895120581
Mean: 1515.463699
Median Absolute Deviation (MAD): 326
Skewness: 1.366560356
Sum: 2212577
Variance: 276129.6334
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
864 | 22 | 1.5%
1040 | 14 | 1.0%
894 | 11 | 0.8%
848 | 10 | 0.7%
1456 | 10 | 0.7%
912 | 9 | 0.6%
1200 | 9 | 0.6%
816 | 8 | 0.5%
1092 | 8 | 0.5%
1344 | 7 | 0.5%
Other values (851) | 1352 | 92.6%

Value | Count | Frequency (%)
334 | 1 | 0.1%
438 | 1 | 0.1%
480 | 1 | 0.1%
520 | 1 | 0.1%
605 | 1 | 0.1%
616 | 1 | 0.1%
630 | 6 | 0.4%
672 | 2 | 0.1%
691 | 1 | 0.1%
693 | 1 | 0.1%

Value | Count | Frequency (%)
5642 | 1 | 0.1%
4676 | 1 | 0.1%
4476 | 1 | 0.1%
4316 | 1 | 0.1%
3627 | 1 | 0.1%
3608 | 1 | 0.1%
3493 | 1 | 0.1%
3447 | 1 | 0.1%
3395 | 1 | 0.1%
3279 | 1 | 0.1%

OverallQual
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.099315068
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 5
Median: 6
Q3: 7
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.382996547
Coefficient of variation (CV): 0.2267462053
Kurtosis: 0.09629277836
Mean: 6.099315068
Median Absolute Deviation (MAD): 1
Skewness: 0.2169439278
Sum: 8905
Variance: 1.912679448
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value | Count | Frequency (%)
5 | 397 | 27.2%
6 | 374 | 25.6%
7 | 319 | 21.8%
8 | 168 | 11.5%
4 | 116 | 7.9%
9 | 43 | 2.9%
3 | 20 | 1.4%
10 | 18 | 1.2%
2 | 3 | 0.2%
1 | 2 | 0.1%

Value | Count | Frequency (%)
1 | 2 | 0.1%
2 | 3 | 0.2%
3 | 20 | 1.4%
4 | 116 | 7.9%
5 | 397 | 27.2%
6 | 374 | 25.6%
7 | 319 | 21.8%
8 | 168 | 11.5%
9 | 43 | 2.9%
10 | 18 | 1.2%

Value | Count | Frequency (%)
10 | 18 | 1.2%
9 | 43 | 2.9%
8 | 168 | 11.5%
7 | 319 | 21.8%
6 | 374 | 25.6%
5 | 397 | 27.2%
4 | 116 | 7.9%
3 | 20 | 1.4%
2 | 3 | 0.2%
1 | 2 | 0.1%

SalePrice
Real number (ℝ≥0)

Distinct: 663
Distinct (%): 45.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180921.1959
Minimum: 34900
Maximum: 755000
Zeros: 0
Zeros (%): 0.0%
Memory size: 22.8 KiB

Quantile statistics

Minimum: 34900
5-th percentile: 88000
Q1: 129975
Median: 163000
Q3: 214000
95-th percentile: 326100
Maximum: 755000
Range: 720100
Interquartile range (IQR): 84025

Descriptive statistics

Standard deviation: 79442.50288
Coefficient of variation (CV): 0.4391000319
Kurtosis: 6.53628186
Mean: 180921.1959
Median Absolute Deviation (MAD): 38000
Skewness: 1.88287576
Sum: 264144946
Variance: 6311111264
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
140000 | 20 | 1.4%
135000 | 17 | 1.2%
145000 | 14 | 1.0%
155000 | 14 | 1.0%
190000 | 13 | 0.9%
110000 | 13 | 0.9%
160000 | 12 | 0.8%
115000 | 12 | 0.8%
139000 | 11 | 0.8%
130000 | 11 | 0.8%
Other values (653) | 1323 | 90.6%

Value | Count | Frequency (%)
34900 | 1 | 0.1%
35311 | 1 | 0.1%
37900 | 1 | 0.1%
39300 | 1 | 0.1%
40000 | 1 | 0.1%
52000 | 1 | 0.1%
52500 | 1 | 0.1%
55000 | 2 | 0.1%
55993 | 1 | 0.1%
58500 | 1 | 0.1%

Value | Count | Frequency (%)
755000 | 1 | 0.1%
745000 | 1 | 0.1%
625000 | 1 | 0.1%
611657 | 1 | 0.1%
582933 | 1 | 0.1%
556581 | 1 | 0.1%
555000 | 1 | 0.1%
538000 | 1 | 0.1%
501837 | 1 | 0.1%
485000 | 1 | 0.1%

Interactions

[90 interaction plots omitted]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
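
As a rough illustration of the definition above, the following plain-Python sketch divides the covariance of two lists by the product of their standard deviations. The function name `pearson_r` is hypothetical, and this is not the code the profiling library runs. The example values are OverallQual and SalePrice from the first five rows listed in the Sample section of this report.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/(n-1) factors cancel between numerator and denominator,
    # so plain sums of squares and cross-products are enough.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r is undefined for a constant sequence")
    return cross / (spread_x * spread_y)


# OverallQual and SalePrice for rows 0-4 of the sample shown in this report.
overall_qual = [7, 6, 7, 7, 8]
sale_price = [208500, 181500, 223500, 140000, 250000]
print(round(pearson_r(overall_qual, sale_price), 3))
```

Five rows are far too few to say anything about the full dataset; the point is only to show the mechanics.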

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
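
The recipe above can be sketched in plain Python: replace each value by its rank (ties share the average of their positions, which is an assumption about tie handling made here), then apply the Pearson formula to the ranks. The names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the rank variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cross / (spread_x * spread_y)


# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```

Because the ranks of `x` and `y` coincide exactly, ρ is at its maximum even though the relationship is not linear.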

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
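
A minimal sketch of the pair-counting definition above. It implements the variant usually called tau-a, which divides by all n(n-1)/2 pairs and counts tied pairs as neither concordant nor discordant. Statistical libraries often default to a tie-corrected variant (tau-b), so results can differ when ties are present. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs, one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```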

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
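
For orientation, here is a plain-Python sketch of a bias-corrected Cramér's V, following the correction as it is commonly stated for Bergsma's proposal: φ² is reduced by (k-1)(r-1)/(n-1) and the table dimensions r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). The function name `cramers_v_corrected` and the toy data are hypothetical, and the report's own implementation may differ in details.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rows, cols, joint = Counter(a), Counter(b), Counter(zip(a, b))
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for row_label, row_total in rows.items():
        for col_label, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint[(row_label, col_label)]
            chi2 += (observed - expected) ** 2 / expected
    r, k = len(rows), len(cols)
    phi2 = chi2 / n
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corrected - 1, r_corrected - 1)
    if denominator <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corrected / denominator)


# Toy data: the second label is fully determined by the first.
first = ["x"] * 50 + ["y"] * 50
second = ["p"] * 50 + ["q"] * 50
print(cramers_v_corrected(first, second))
```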

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

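Index | YearRemodAdd | YearBuilt | TotRmsAbvGrd | FullBath | 1stFlrSF | GarageArea | TotalBsmtSF | GarageCars | 2ndFlrSF | GrLivArea | OverallQual | SalePrice
0 | 2003 | 2003 | 8 | 2 | 856 | 548 | 856 | 2 | 854 | 1710 | 7 | 208500
1 | 1976 | 1976 | 6 | 2 | 1262 | 460 | 1262 | 2 | 0 | 1262 | 6 | 181500
2 | 2002 | 2001 | 6 | 2 | 920 | 608 | 920 | 2 | 866 | 1786 | 7 | 223500
3 | 1970 | 1915 | 7 | 1 | 961 | 642 | 756 | 3 | 756 | 1717 | 7 | 140000
4 | 2000 | 2000 | 9 | 2 | 1145 | 836 | 1145 | 3 | 1053 | 2198 | 8 | 250000
5 | 1995 | 1993 | 5 | 1 | 796 | 480 | 796 | 2 | 566 | 1362 | 5 | 143000
6 | 2005 | 2004 | 7 | 2 | 1694 | 636 | 1686 | 2 | 0 | 1694 | 8 | 307000
7 | 1973 | 1973 | 7 | 2 | 1107 | 484 | 1107 | 2 | 983 | 2090 | 7 | 200000
8 | 1950 | 1931 | 8 | 2 | 1022 | 468 | 952 | 2 | 752 | 1774 | 7 | 129900
9 | 1950 | 1939 | 5 | 1 | 1077 | 205 | 991 | 1 | 0 | 1077 | 5 | 118000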

Last rows

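Index | YearRemodAdd | YearBuilt | TotRmsAbvGrd | FullBath | 1stFlrSF | GarageArea | TotalBsmtSF | GarageCars | 2ndFlrSF | GrLivArea | OverallQual | SalePrice
1450 | 1974 | 1974 | 8 | 2 | 896 | 0 | 896 | 0 | 896 | 1792 | 5 | 136000
1451 | 2009 | 2008 | 7 | 2 | 1578 | 840 | 1573 | 3 | 0 | 1578 | 8 | 287090
1452 | 2005 | 2005 | 5 | 1 | 1072 | 525 | 547 | 2 | 0 | 1072 | 5 | 145000
1453 | 2006 | 2006 | 6 | 1 | 1140 | 0 | 1140 | 0 | 0 | 1140 | 5 | 84500
1454 | 2005 | 2004 | 6 | 2 | 1221 | 400 | 1221 | 2 | 0 | 1221 | 7 | 185000
1455 | 2000 | 1999 | 7 | 2 | 953 | 460 | 953 | 2 | 694 | 1647 | 6 | 175000
1456 | 1988 | 1978 | 7 | 2 | 2073 | 500 | 1542 | 2 | 0 | 2073 | 6 | 210000
1457 | 2006 | 1941 | 9 | 2 | 1188 | 252 | 1152 | 1 | 1152 | 2340 | 7 | 266500
1458 | 1996 | 1950 | 5 | 1 | 1078 | 240 | 1078 | 1 | 0 | 1078 | 5 | 142125
1459 | 1965 | 1965 | 6 | 1 | 1256 | 276 | 1256 | 1 | 0 | 1256 | 5 | 147500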